Multi-Player Residual Advantage Learning With General Function Approximation

Author

  • Mance E. Harmon
Abstract

A new algorithm, advantage learning, is presented that improves on advantage updating by requiring that a single function be learned rather than two. Furthermore, advantage learning requires only a single type of update, the learning update, while advantage updating requires two different types of updates, a learning update and a normalization update. The reinforcement learning system uses the residual form of advantage learning. An application of reinforcement learning to a Markov game is presented. The test-bed has continuous states and nonlinear dynamics. The game consists of two players, a missile and a plane; the missile pursues the plane and the plane evades the missile. On each time step, each player chooses one of two possible actions: turn left or turn right, resulting in a 90-degree instantaneous change in the aircraft's heading. Reinforcement is given only when the missile hits the plane or the plane reaches an escape distance from the missile. The advantage function is stored in a single-hidden-layer sigmoidal network. Speed of learning is increased by a new algorithm, Incremental Delta-Delta (IDD), which extends Jacobs' (1988) Delta-Delta for use in incremental training, and differs from Sutton's Incremental Delta-Bar-Delta (1992) in that it does not require the use of a trace and is amenable to use with general function approximation systems. The advantage learning algorithm for optimal control is modified for games in order to find the minimax point, rather than the maximum. Empirical results gathered using the missile/aircraft test-bed validate the theory that residual forms of reinforcement learning algorithms converge to a local minimum of the mean squared Bellman residual when using general function approximation systems. Also, to our knowledge, this is the first time an approximate second-order method has been used with residual algorithms. Empirical results are presented comparing convergence rates with and without the use of IDD for the reinforcement learning test-bed described above and for a supervised learning test-bed. The results of these experiments demonstrate that IDD increased the rate of convergence and resulted in an order of magnitude lower total asymptotic error than when using backpropagation alone.
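As a reading aid, here is a minimal sketch of the quantities the abstract refers to, following the published descriptions of advantage learning and residual algorithms (Baird 1995; Harmon and Baird 1996); the scaling constant K and time step Δt are standard symbols from that literature and are assumptions here, not values quoted from this page:

    A^*(x,u) \;=\; \max_{u'} A^*(x,u') \;+\; \frac{R \;+\; \gamma^{\Delta t}\,\max_{u'} A^*(x',u') \;-\; \max_{u'} A^*(x,u')}{K\,\Delta t}

For the two-player game, the max over actions is replaced by a minimax over the joint missile/plane action pair, and the residual form adjusts the network weights by gradient descent on the mean squared Bellman residual E = \tfrac{1}{2}\sum \bigl(\text{right-hand side} - A(x,u)\bigr)^2, which is the quantity the local-minimum convergence claim refers to.

The IDD rule described in the abstract (per-weight step sizes adapted from the agreement of successive gradients, with no trace) also admits a short sketch. The update below is an illustrative assumption rather than the paper's exact formulation; the function name and the meta step size kappa are hypothetical:

    import numpy as np

    def idd_step(w, grad, prev_grad, log_alpha, kappa=0.01):
        """One Incremental Delta-Delta style update (illustrative sketch).

        w          -- weight vector of the function approximator
        grad       -- gradient of the error on the current example
        prev_grad  -- gradient on the previous example (no trace is kept)
        log_alpha  -- per-weight log step sizes, adapted online
        kappa      -- meta step size (assumed value)
        """
        # Grow a weight's step size when successive gradients agree in sign,
        # shrink it when they disagree: Jacobs' Delta-Delta idea applied one
        # example at a time instead of once per epoch.
        log_alpha = log_alpha + kappa * grad * prev_grad
        alpha = np.exp(log_alpha)

        # Ordinary per-weight gradient descent with the adapted step sizes.
        w = w - alpha * grad
        return w, log_alpha

Parameterizing the step sizes through log_alpha keeps them positive without clipping; the paper may use a different parameterization, and the gradients here would be those of the squared Bellman residual in the reinforcement learning test-bed or of the squared error in the supervised learning test-bed.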


Related articles

Residual Advantage Learning Applied to a Differential Game

An application of reinforcement learning to a differential game is presented. The reinforcement learning system uses a recently developed algorithm, the residual form of advantage learning. The game is a Markov decision process (MDP) with continuous states and nonlinear dynamics. The game consists of two players, a missile and a plane; the missile pursues the plane and the plane evades the miss...

Residual Algorithms: Reinforcement Learning with Function Approximation

A number of reinforcement learning algorithms have been developed that are guaranteed to converge to the optimal solution when used with lookup tables. It is shown, however, that these algorithms can easily become unstable when implemented directly with a general function-approximation system, such as a sigmoidal multilayer perceptron, a radial-basisfunction system, a memory-based learning syst...

Adaptive Bases for Reinforcement Learning

We consider the problem of reinforcement learning using function approximation, where the approximating basis can change dynamically while interacting with the environment. A motivation for such an approach is maximizing the value function fitness to the problem faced. Three errors are considered: approximation square error, Bellman residual, and projected Bellman residual. Algorithms under the...

Advantage Updating Applied to a Differential Game

An application of reinforcement learning to a linear-quadratic differential game is presented. The reinforcement learning system uses a recently developed algorithm, the residual gradient form of advantage updating. The game is a Markov decision process (MDP) with continuous time, states, and actions, linear dynamics, and a quadratic cost function. The game consists of two players, a missile a...

An Unsupervised Learning Method for an Attacker Agent in Robot Soccer Competitions Based on the Kohonen Neural Network

The RoboCup competition, as a great test-bed, has become a popular worldwide domain in recent years. The main objective of such competitions is to deal with the complex behavior of systems which consist of multiple autonomous agents. The rich experience of human soccer players can be used as a valuable reference for a robot soccer player. However, because of the differences between real and simulated soc...

Publication year: 1996